The ALVIS Format for Linguistically Annotated Documents
Abstract
This paper describes the ALVIS annotation format and discusses the problems we encountered when indexing large document collections for topic-specific search engines. The examples are drawn from the biological domain and from MedLine abstracts, since developing a specialized search engine for biologists is one of the ALVIS case studies. The ALVIS approach to linguistic annotation builds on existing work and proposed standards. We chose stand-off annotations rather than inserted mark-up; the annotations are encoded as XML elements that form the linguistic subsection of the document record.
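To make the contrast between stand-off annotation and inserted mark-up concrete, here is a minimal sketch. It does not reproduce the ALVIS schema: the element name `linguisticAnalysis`, the `annotation` element, its `start`/`end`/`type` attributes, and the sample sentence are all hypothetical and chosen only for illustration. The point is that the source text is left untouched and each annotation refers to it by character offsets.

```python
import xml.etree.ElementTree as ET

# Sample sentence (illustrative only, not taken from the paper).
text = "Bacillus subtilis sporulation is controlled by sigma factors."


def span_of(text, phrase, label):
    """Locate `phrase` in `text` and return (start, end, label) offsets."""
    start = text.index(phrase)
    return (start, start + len(phrase), label)


def build_standoff(spans):
    """Build a hypothetical <linguisticAnalysis> element.

    Each child points into the text by character offset instead of
    wrapping the text, so the original document is never modified.
    """
    root = ET.Element("linguisticAnalysis")
    for i, (start, end, label) in enumerate(spans, 1):
        ET.SubElement(
            root,
            "annotation",
            id=f"a{i}",
            type=label,
            start=str(start),
            end=str(end),
        )
    return root


def resolve(text, root):
    """Recover (label, covered string) pairs from the stand-off annotations."""
    return [
        (ann.get("type"), text[int(ann.get("start")):int(ann.get("end"))])
        for ann in root.iter("annotation")
    ]


spans = [
    span_of(text, "Bacillus subtilis", "species"),
    span_of(text, "sporulation", "process"),
]
analysis = build_standoff(spans)
resolved = resolve(text, analysis)

# Serialize the annotation layer separately from the text.
xml_string = ET.tostring(analysis, encoding="unicode")
```

Because the annotations live in a separate element, several tools can add their own layers over the same text without having to agree on how to nest inline tags, which is one common motivation for the stand-off design.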
Similar resources
A robust linguistic infrastructure for efficient web content analysis: the ALVIS project
This paper focuses on the design and the development of a text processing architecture exploiting specialized NLP tools, to produce linguistically annotated documents. This architecture is instantiated using existing NLP modules and resources which need to be tuned to specific domains. Taking as an example the biological domain, we show how a syntactic analyser can be adapted to this domain. We...
A Robust Linguistic Platform for Efficient and Domain specific Web Content Analysis
Web semantic access in specific domains calls for specialized search engines with enhanced semantic querying and indexing capacities, which pertain both to information retrieval (IR) and to information extraction (IE). A rich linguistic analysis is required either to identify the relevant semantic units to index and weight them according to linguistic specific statistical distribution, or as th...
EULIA: a graphical web interface for creating, browsing and editing linguistically annotated corpora
In this paper we present EULIA, a tool which has been designed for dealing with the linguistically annotated corpora generated by a set of different linguistic processing tools. The objective of EULIA is to provide a flexible and extensible environment for creating, consulting, visualizing, and modifying documents generated by existing linguistic tools. The documents used as input and output of the...
Ontology Learning and Semantic Annotation: a Necessary Symbiosis
Semantic annotation of text requires the dynamic merging of linguistically structured information and a “world model”, usually represented as a domain-specific ontology. On the other hand, the process of engineering a domain ontology through a semi-automatic ontology learning system requires the availability of a considerable amount of semantically annotated documents. Facing this bootstrapping p...
Interoperability of Annotation Schemes: Using the Pepper Framework to Display AWA Documents in the ANNIS Interface
Natural language processing applications are frequently integrated to solve complex linguistic problems, but the lack of interoperability between these tools tends to be one of the main issues found in that process. That is often caused by the different linguistic formats used across the applications, which leads to attempts to both establish standard formats to represent linguistic information...